Adaptive hybrid methods for Feature selection based on Aggregation of Information gain and Clustering methods
Abstract
The growing abundance of information calls for appropriate methods of organization and evaluation. Mining data for information and drawing conclusions from it has been a fertile field of research. However, data mining needs methods to preprocess the data. Feature selection, a field of growing interest, concerns selecting the relevant information from information repositories. The aim of this paper is to highlight the need for feature selection methods in data mining that capture the best characteristics of the data. Recently there has been interest in developing hybrid feature selection methods that combine the characteristics of various filter and wrapper methods. The proposed method advocates an adaptive aggregation strategy using (a) the gain ratio of candidate features and (b) clustering methods to find the distribution of candidate features. The underlying principle of the strategy is that the best individual features need not constitute the best subset of features representing the problem: a given feature might provide more information when present with certain other feature(s) than when considered by itself. The adaptive method has been implemented for datasets from the UCI repository and the results were correlated. The proposed method shows encouraging results.
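The abstract names the gain ratio as the score for candidate features but does not give a formula. As a minimal sketch, assuming the standard definition (information gain divided by the split information of the feature) and discrete feature values, it could be computed as follows. The function names `entropy` and `gain_ratio` are illustrative and are not taken from the paper, and the clustering step is not shown because the abstract does not describe it.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(feature, labels):
    """Gain ratio of a discrete feature with respect to the class labels.

    Assumes the standard definition: information gain / split information.
    Returns 0.0 for a constant feature, whose split information is zero.
    """
    total = len(labels)

    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)

    # Expected entropy of the labels once the feature value is known.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional

    # Split information is the entropy of the feature's own distribution.
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0
```

A feature that separates the classes perfectly scores close to 1, while a feature that is independent of the classes scores 0. Ranking features by this score alone evaluates each one in isolation, which is the limitation the abstract's aggregation strategy is meant to address.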
Similar resources
A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...
A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts
High dimensional microarray datasets are difficult to classify since they have many features with a small number of instances and an imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...
Intrusion Detection based on a Novel Hybrid Learning Approach
Information security and Intrusion Detection Systems (IDS) play a critical role in the Internet. An IDS is an essential tool for detecting different kinds of attacks in a network and for maintaining data integrity, confidentiality and system availability against possible threats. In this paper, a hybrid approach towards achieving high performance is proposed. In fact, the important goal of this paper ...
Modeling and design of a diagnostic and screening algorithm based on hybrid feature selection-enabled linear support vector machine classification
Background: In the current study, a hybrid feature selection approach involving filter and wrapper methods is applied to some bioscience databases with various records, attributes and classes; hence, this strategy enjoys the advantages of both methods, such as fast execution, generality, and accuracy. The purpose is to diagnose the disease status and to estimate patient survival. Method...
Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks are discussed. Next, a platform is developed as an intermediate step toward developing an intell...